Exploratory Data Analysis of Loan Data from Prosper
========================================================
Barbara Stempien
Prosper Marketplace is America’s first peer-to-peer lending marketplace, with over $7 billion in funded loans. Borrowers request personal loans on Prosper and investors (individual or institutional) can fund anywhere from $2,000 to $35,000 per loan request. Investors can consider borrowers’ credit scores, ratings, and histories and the category of the loan. Prosper handles the servicing of the loan and collects and distributes borrower payments and interest back to the loan investors.
Prosper verifies borrowers’ identities and select personal data before funding loans and manages all stages of loan servicing. Prosper’s unsecured personal loans are fully amortized over a period of three or five years, with no pre-payment penalties. Prosper generates revenue by collecting a one-time fee on funded loans from borrowers and assessing an annual loan servicing fee to investors.
Prosper publishes performance statistics on its website, and all market data is available to the public for analysis. The Prosper loan data set contains 113,937 loans with 81 variables on each loan, including loan amount, borrower rate (or interest rate), current loan status, borrower income, borrower employment status, borrower credit history, and the latest payment information.
We will begin the analysis by looking at the data contained in the data set.
## ListingKey ListingNumber ListingCreationDate
## 1 1021339766868145413AB3B 193129 2007-08-26 19:09:29.263000000
## 2 10273602499503308B223C1 1209647 2014-02-27 08:28:07.900000000
## 3 0EE9337825851032864889A 81716 2007-01-05 15:00:47.090000000
## 4 0EF5356002482715299901A 658116 2012-10-22 11:02:35.010000000
## 5 0F023589499656230C5E3E2 909464 2013-09-14 18:38:39.097000000
## 6 0F05359734824199381F61D 1074836 2013-12-14 08:26:37.093000000
## CreditGrade Term LoanStatus ClosedDate BorrowerAPR BorrowerRate
## 1 C 36 Completed 2009-08-14 00:00:00 0.16516 0.1580
## 2 <NA> 36 Current <NA> 0.12016 0.0920
## 3 HR 36 Completed 2009-12-17 00:00:00 0.28269 0.2750
## 4 <NA> 36 Current <NA> 0.12528 0.0974
## 5 <NA> 36 Current <NA> 0.24614 0.2085
## 6 <NA> 60 Current <NA> 0.15425 0.1314
## LenderYield EstimatedEffectiveYield EstimatedLoss EstimatedReturn
## 1 0.1380 NA NA NA
## 2 0.0820 0.07960 0.0249 0.05470
## 3 0.2400 NA NA NA
## 4 0.0874 0.08490 0.0249 0.06000
## 5 0.1985 0.18316 0.0925 0.09066
## 6 0.1214 0.11567 0.0449 0.07077
## ProsperRating..numeric. ProsperRating..Alpha. ProsperScore
## 1 NA <NA> NA
## 2 6 A 7
## 3 NA <NA> NA
## 4 6 A 9
## 5 3 D 4
## 6 5 B 10
## ListingCategory..numeric. BorrowerState Occupation EmploymentStatus
## 1 0 CO Other Self-employed
## 2 2 CO Professional Employed
## 3 0 GA Other Not available
## 4 16 GA Skilled Labor Employed
## 5 2 MN Executive Employed
## 6 1 NM Professional Employed
## EmploymentStatusDuration IsBorrowerHomeowner CurrentlyInGroup
## 1 2 True True
## 2 44 False False
## 3 NA False True
## 4 113 True False
## 5 44 True False
## 6 82 True False
## GroupKey DateCreditPulled
## 1 <NA> 2007-08-26 18:41:46.780000000
## 2 <NA> 2014-02-27 08:28:14
## 3 783C3371218786870A73D20 2007-01-02 14:09:10.060000000
## 4 <NA> 2012-10-22 11:02:32
## 5 <NA> 2013-09-14 18:38:44
## 6 <NA> 2013-12-14 08:26:40
## CreditScoreRangeLower CreditScoreRangeUpper FirstRecordedCreditLine
## 1 640 659 2001-10-11 00:00:00
## 2 680 699 1996-03-18 00:00:00
## 3 480 499 2002-07-27 00:00:00
## 4 800 819 1983-02-28 00:00:00
## 5 680 699 2004-02-20 00:00:00
## 6 740 759 1973-03-01 00:00:00
## CurrentCreditLines OpenCreditLines TotalCreditLinespast7years
## 1 5 4 12
## 2 14 14 29
## 3 NA NA 3
## 4 5 5 29
## 5 19 19 49
## 6 21 17 49
## OpenRevolvingAccounts OpenRevolvingMonthlyPayment InquiriesLast6Months
## 1 1 24 3
## 2 13 389 3
## 3 0 0 0
## 4 7 115 0
## 5 6 220 1
## 6 13 1410 0
## TotalInquiries CurrentDelinquencies AmountDelinquent
## 1 3 2 472
## 2 5 0 0
## 3 1 1 NA
## 4 1 4 10056
## 5 9 0 0
## 6 2 0 0
## DelinquenciesLast7Years PublicRecordsLast10Years
## 1 4 0
## 2 0 1
## 3 0 0
## 4 14 0
## 5 0 0
## 6 0 0
## PublicRecordsLast12Months RevolvingCreditBalance BankcardUtilization
## 1 0 0 0.00
## 2 0 3989 0.21
## 3 NA NA NA
## 4 0 1444 0.04
## 5 0 6193 0.81
## 6 0 62999 0.39
## AvailableBankcardCredit TotalTrades TradesNeverDelinquent..percentage.
## 1 1500 11 0.81
## 2 10266 29 1.00
## 3 NA NA NA
## 4 30754 26 0.76
## 5 695 39 0.95
## 6 86509 47 1.00
## TradesOpenedLast6Months DebtToIncomeRatio IncomeRange
## 1 0 0.17 $25,000-49,999
## 2 2 0.18 $50,000-74,999
## 3 NA 0.06 Not displayed
## 4 0 0.15 $25,000-49,999
## 5 2 0.26 $100,000+
## 6 0 0.36 $100,000+
## IncomeVerifiable StatedMonthlyIncome LoanKey
## 1 True 3083.333 E33A3400205839220442E84
## 2 True 6125.000 9E3B37071505919926B1D82
## 3 True 2083.333 6954337960046817851BCB2
## 4 True 2875.000 A0393664465886295619C51
## 5 True 9583.333 A180369302188889200689E
## 6 True 8333.333 C3D63702273952547E79520
## TotalProsperLoans TotalProsperPaymentsBilled OnTimeProsperPayments
## 1 NA NA NA
## 2 NA NA NA
## 3 NA NA NA
## 4 NA NA NA
## 5 1 11 11
## 6 NA NA NA
## ProsperPaymentsLessThanOneMonthLate ProsperPaymentsOneMonthPlusLate
## 1 NA NA
## 2 NA NA
## 3 NA NA
## 4 NA NA
## 5 0 0
## 6 NA NA
## ProsperPrincipalBorrowed ProsperPrincipalOutstanding
## 1 NA NA
## 2 NA NA
## 3 NA NA
## 4 NA NA
## 5 11000 9947.9
## 6 NA NA
## ScorexChangeAtTimeOfListing LoanCurrentDaysDelinquent
## 1 NA 0
## 2 NA 0
## 3 NA 0
## 4 NA 0
## 5 NA 0
## 6 NA 0
## LoanFirstDefaultedCycleNumber LoanMonthsSinceOrigination LoanNumber
## 1 NA 78 19141
## 2 NA 0 134815
## 3 NA 86 6466
## 4 NA 16 77296
## 5 NA 6 102670
## 6 NA 3 123257
## LoanOriginalAmount LoanOriginationDate LoanOriginationQuarter
## 1 9425 2007-09-12 00:00:00 Q3 2007
## 2 10000 2014-03-03 00:00:00 Q1 2014
## 3 3001 2007-01-17 00:00:00 Q1 2007
## 4 10000 2012-11-01 00:00:00 Q4 2012
## 5 15000 2013-09-20 00:00:00 Q3 2013
## 6 15000 2013-12-24 00:00:00 Q4 2013
## MemberKey MonthlyLoanPayment LP_CustomerPayments
## 1 1F3E3376408759268057EDA 330.43 11396.14
## 2 1D13370546739025387B2F4 318.93 0.00
## 3 5F7033715035555618FA612 123.32 4186.63
## 4 9ADE356069835475068C6D2 321.45 5143.20
## 5 36CE356043264555721F06C 563.97 2819.85
## 6 874A3701157341738DE458F 342.37 679.34
## LP_CustomerPrincipalPayments LP_InterestandFees LP_ServiceFees
## 1 9425.00 1971.14 -133.18
## 2 0.00 0.00 0.00
## 3 3001.00 1185.63 -24.20
## 4 4091.09 1052.11 -108.01
## 5 1563.22 1256.63 -60.27
## 6 351.89 327.45 -25.33
## LP_CollectionFees LP_GrossPrincipalLoss LP_NetPrincipalLoss
## 1 0 0 0
## 2 0 0 0
## 3 0 0 0
## 4 0 0 0
## 5 0 0 0
## 6 0 0 0
## LP_NonPrincipalRecoverypayments PercentFunded Recommendations
## 1 0 1 0
## 2 0 1 0
## 3 0 1 0
## 4 0 1 0
## 5 0 1 0
## 6 0 1 0
## InvestmentFromFriendsCount InvestmentFromFriendsAmount Investors
## 1 0 0 258
## 2 0 0 1
## 3 0 0 41
## 4 0 0 158
## 5 0 0 20
## 6 0 0 1
## LoanStatus ListingCreationDate ClosedDate
## Current :56576 Min. :2005-11-09 Min. :2005-11-25
## Completed :38074 1st Qu.:2008-09-19 1st Qu.:2009-07-14
## Charged Off :11992 Median :2012-06-16 Median :2011-04-05
## Defaulted : 5018 Mean :2011-07-08 Mean :2011-03-07
## Past Due : 2067 3rd Qu.:2013-09-09 3rd Qu.:2013-01-30
## Final Payment: 205 Max. :2014-03-10 Max. :2014-03-10
## Cancelled : 5 NA's :58848
## ListingCategory..numeric. ListingCategory Term
## Min. : 0.000 Debt Consolidation:58308 12: 1614
## 1st Qu.: 1.000 Not Available :16965 36:87778
## Median : 1.000 Other :10494 60:24545
## Mean : 2.774 Home Improvement : 7433
## 3rd Qu.: 3.000 Business : 7189
## Max. :20.000 Auto : 2572
## (Other) :10976
## BorrowerRate LoanOriginalAmount MonthlyLoanPayment BorrowerState
## Min. :0.0000 Min. : 1000 Min. : 0.0 Length:113937
## 1st Qu.:0.1340 1st Qu.: 4000 1st Qu.: 131.6 Class :character
## Median :0.1840 Median : 6500 Median : 217.7 Mode :character
## Mean :0.1928 Mean : 8337 Mean : 272.5
## 3rd Qu.:0.2500 3rd Qu.:12000 3rd Qu.: 371.6
## Max. :0.4975 Max. :35000 Max. :2251.5
##
## IsBorrowerHomeowner Occupation
## Mode :logical Other :28617
## FALSE:56459 Professional :13628
## TRUE :57478 Computer Programmer : 4478
## Executive : 4311
## Teacher : 3759
## Administrative Assistant: 3688
## (Other) :55456
## EmploymentStatus StatedMonthlyIncome DebtToIncomeRatio
## Employed :67322 Min. : 0 Min. : 0.0000
## Full-time :26355 1st Qu.: 3200 1st Qu.: 0.1300
## Not available: 7602 Median : 4667 Median : 0.2100
## Self-employed: 6134 Mean : 5608 Mean : 0.2552
## Other : 3806 3rd Qu.: 6825 3rd Qu.: 0.3100
## Part-time : 1088 Max. :1750003 Max. :10.0100
## (Other) : 1630
## ProsperRating..Alpha. ProsperScore OpenCreditLines
## C :18345 0 :29084 Min. : 0.000
## B :15581 4 :12595 1st Qu.: 5.000
## A :14551 6 :12278 Median : 8.000
## D :14274 8 :12053 Mean : 8.642
## E : 9795 7 :10597 3rd Qu.:12.000
## (Other):12307 5 : 9813 Max. :54.000
## NA's :29084 (Other):27517
## TotalCreditLinespast7years OpenRevolvingAccounts
## Min. : 0.00 Min. : 0.00
## 1st Qu.: 17.00 1st Qu.: 4.00
## Median : 25.00 Median : 6.00
## Mean : 26.59 Mean : 6.97
## 3rd Qu.: 35.00 3rd Qu.: 9.00
## Max. :136.00 Max. :51.00
##
## OpenRevolvingMonthlyPayment CurrentDelinquencies AmountDelinquent
## Min. : 0.0 Min. : 0.0000 Min. : 0.0
## 1st Qu.: 114.0 1st Qu.: 0.0000 1st Qu.: 0.0
## Median : 271.0 Median : 0.0000 Median : 0.0
## Mean : 398.3 Mean : 0.5884 Mean : 918.6
## 3rd Qu.: 525.0 3rd Qu.: 0.0000 3rd Qu.: 0.0
## Max. :14985.0 Max. :83.0000 Max. :463881.0
##
## DelinquenciesLast7Years PublicRecordsLast10Years RevolvingCreditBalance
## Min. : 0.000 Min. : 0.0000 Min. : 0
## 1st Qu.: 0.000 1st Qu.: 0.0000 1st Qu.: 2091
## Median : 0.000 Median : 0.0000 Median : 7593
## Mean : 4.119 Mean : 0.3107 Mean : 16424
## 3rd Qu.: 3.000 3rd Qu.: 0.0000 3rd Qu.: 18254
## Max. :99.000 Max. :38.0000 Max. :1435667
##
## BankcardUtilization LenderYield Investors
## Min. :0.0000 Min. :-0.0100 Min. : 1.00
## 1st Qu.:0.2300 1st Qu.: 0.1242 1st Qu.: 2.00
## Median :0.5600 Median : 0.1730 Median : 44.00
## Mean :0.5238 Mean : 0.1827 Mean : 80.48
## 3rd Qu.:0.8200 3rd Qu.: 0.2400 3rd Qu.: 115.00
## Max. :5.9500 Max. : 0.4925 Max. :1189.00
##
First, we want to see when and how many listings were created.
The graph above shows that the number of Prosper listings has grown over time. Listings collapsed in 2008, but since 2009 the number has risen steadily.
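A yearly count like the one behind this graph can be sketched in base R. The six timestamps below are the ListingCreationDate values from the data preview; the variable names are illustrative only, and the original plot may have been built differently.

```r
# ListingCreationDate values of the first six rows shown in the data preview
listing_creation <- c("2007-08-26 19:09:29.263000000",
                      "2014-02-27 08:28:07.900000000",
                      "2007-01-05 15:00:47.090000000",
                      "2012-10-22 11:02:35.010000000",
                      "2013-09-14 18:38:39.097000000",
                      "2013-12-14 08:26:37.093000000")

# Keep only the date part (first 10 characters) and parse it as a Date
listing_date <- as.Date(substr(listing_creation, 1, 10))

# Count listings per calendar year
listings_per_year <- table(format(listing_date, "%Y"))
print(listings_per_year)
```

On the full data set the same two steps would give one count per year, which can then be passed to any bar plot function.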
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1000 4000 6500 8337 12000 35000
Another interesting feature is the loan amount. In the above graph we can see that the middle half of loans falls between $4,000 and $12,000. The mean loan amount is $8,337 and the median is $6,500.
The vast majority of loans are granted for 36 months. Only a small fraction of loans are granted for 12 months.
Debt consolidation is the most common reason for a loan. For a large number of loans (16,965) the category is not available. Among the known categories, Other comes second, followed by Home Improvement and Business.
About half of the loans in the dataset are currently active (56,576). Over 38,000 have been paid off. Around 17,000 were charged off or defaulted.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.1340 0.1840 0.1928 0.2500 0.4975
The borrower rate is 0.1928 on average. Although a considerable number of loans are granted at rates of 0.14, 0.15 or 0.18, many are granted at a much higher rate, for instance 0.32 (5,914 loans).
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0 131.6 217.7 272.5 371.6 2251.5
The average monthly installment is $272. The middle half of installments falls between about $132 and $372.
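Since the loans are fully amortized, the monthly installment can be approximated from the loan amount, the borrower rate and the term. The sketch below uses the standard annuity formula and assumes that the annual borrower rate is compounded monthly, which the data set itself does not document. The function name `monthly_payment` is illustrative.

```r
# Monthly payment on a fully amortized loan (standard annuity formula).
# Assumption: the annual rate is split into 12 equal monthly rates.
monthly_payment <- function(principal, annual_rate, term_months) {
  r <- annual_rate / 12
  if (r == 0) return(principal / term_months)  # zero-interest edge case
  principal * r / (1 - (1 + r)^(-term_months))
}

# Rows 1 and 2 of the data preview: LoanOriginalAmount, BorrowerRate, Term
p1 <- monthly_payment(9425, 0.1580, 36)
p2 <- monthly_payment(10000, 0.0920, 36)
print(round(c(p1, p2), 2))
```

Under that assumption, both results come out within a cent of the MonthlyLoanPayment values reported for those two rows (330.43 and 318.93).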
The largest number of loans is granted to borrowers living in California.
The share of homeowners in the total number of borrowers is around 50%.
The vast majority of borrowers are currently employed. Only a fraction of borrowers do not have a job or work part-time.
Unfortunately, for a large share of loans the borrower’s occupation is recorded only as Other (28,617 loans). Among the named occupations, the most common are professionals, computer programmers and executives.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0 3200 4667 5608 6825 1750003
The average stated monthly income of a borrower is $5,608. The middle half of borrowers report between $3,200 and $6,825 per month.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.1300 0.2100 0.2552 0.3100 10.0100
It is interesting to compare debt to income. As we can see in the plot above, the average debt to income ratio is 0.2552, and the middle half of borrowers falls between 0.13 and 0.31.
Another important aspect when assessing credit risk is the number of delinquencies that the borrower had in recent years. As we can see in the plot above, most borrowers did not have any delinquencies in the last 7 years.
We see a similar situation for public records from the last 10 years: for most borrowers, the count is 0.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.2300 0.5600 0.5238 0.8200 5.9500
Credit card utilization is also an important factor in risk assessment. As we can see in the plot above, the average bankcard utilization is 52%, and the middle half of borrowers used between 23% and 82% of their available credit.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0 2091 7593 16424 18254 1435667
Another value that can help us analyze the borrower’s financial status is the revolving credit balance. As we can see, the average value is $16,424, and the middle half of borrowers falls in the $2,091 to $18,254 range.
Based on these data and several other factors, Prosper assesses the credit risk associated with each borrower. As we can see in the graph above, the most common rating is C, followed by B. For a large number of borrowers (29,084) we do not have a rating, as this information was not collected before 2009.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 2.00 44.00 80.48 115.00 1189.00
The average number of investors per loan is 80, while the median is 44. Three quarters of loans have between 1 and 115 investors. The highest number of investors for a single loan was 1,189.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -0.0100 0.1242 0.1730 0.1827 0.2400 0.4925
The lender yield is the interest rate minus the expected fee payments. It is the most important input into any return calculation. As we can see, the lender yield is 0.1827 on average. The middle half of loans ranges from 0.1242 to 0.2400.
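The relation between borrower rate and lender yield can be checked on the six rows of the data preview. The sketch below simply subtracts one column from the other; the name `implied_fee` is illustrative and is not a variable of the data set.

```r
# BorrowerRate and LenderYield for the first six rows of the data preview
borrower_rate <- c(0.1580, 0.0920, 0.2750, 0.0974, 0.2085, 0.1314)
lender_yield  <- c(0.1380, 0.0820, 0.2400, 0.0874, 0.1985, 0.1214)

# Difference between what the borrower pays and what the lender earns,
# rounded to remove floating-point noise
implied_fee <- round(borrower_rate - lender_yield, 4)
print(implied_fee)
```

For these rows the gap is 0.01 for the four loans listed in 2012 or later, and larger (0.02 and 0.035) for the two loans from 2007. Six rows are far too few to generalize from, so this is only a sanity check of the definition.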
The Prosper loan data set contains 113,937 loans with 81 variables on each loan. I have selected 29 variables from the original data set for my analysis.
I am most interested in the profile of the typical peer-to-peer borrower who fails to pay off a loan: what are the demographics and credit characteristics of borrowers who default or are past due on their loans?
Features such as lender yield and number of investors may help show which borrower characteristics attract investors, and are therefore desirable, and which do not.
I created a new Listing Category variable with textual values, based on the original ListingCategory..numeric. variable.
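A recoding of this kind can be sketched as follows. The mapping covers only codes 0 to 7 and is written down as I recall it from Prosper's variable definitions, so it is an assumption that should be checked against the official dictionary. Codes outside that range are left as NA in this sketch.

```r
# Labels for codes 0-7 (assumed mapping; verify against Prosper's definitions)
category_labels <- c("Not Available", "Debt Consolidation", "Home Improvement",
                     "Business", "Personal Loan", "Student Use", "Auto", "Other")

# ListingCategory..numeric. values of the six rows in the data preview
listing_code <- c(0, 2, 0, 16, 2, 1)

# Look up each code in 0:7; codes not covered here (such as 16) become NA
listing_category <- factor(category_labels[match(listing_code, 0:7)],
                           levels = category_labels)
print(listing_category)
```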
Did you perform any operations on the data to tidy, adjust, or change the form
of the data? If so, why did you do this?
I have:
* grouped the various Past Due statuses of the Loan Status variable into one Past Due status, and converted this variable into a factor with 7 levels;
* converted the Listing Creation Date variable to date format;
* converted Term to an ordered factor with 3 levels;
* converted the Closed Date variable to date format;
* converted Prosper Rating to an ordered factor with 8 levels;
* converted Prosper Score to an ordered factor with 12 levels;
* converted the State abbreviation to the full name and added it as a new variable;
* converted Is Borrower Homeowner to logical type;
* replaced NA in Occupation with Not Available;
* replaced NA in Employment Status with Not Available.
These changes have been performed to ease plotting.
While building the plots, I also grouped variables and subset the data set whenever needed.
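The tidying steps listed above can be sketched in base R on a tiny made-up data frame. The rows below are hypothetical, the new column name `BorrowerStateName` is illustrative, and the code shows one possible way to do each step rather than the exact code used for this report.

```r
# Hypothetical sample rows that mimic the raw columns
loans <- data.frame(
  LoanStatus          = c("Current", "Past Due (1-15 days)",
                          "Past Due (>120 days)", "Completed"),
  Term                = c(36, 60, 12, 36),
  ListingCreationDate = c("2007-08-26 19:09:29.263000000",
                          "2014-02-27 08:28:07.900000000",
                          "2012-10-22 11:02:35.010000000",
                          "2013-09-14 18:38:39.097000000"),
  BorrowerState       = c("CO", "GA", "MN", NA),
  Occupation          = c("Other", NA, "Executive", "Professional"),
  IsBorrowerHomeowner = c("True", "False", "True", "True"),
  stringsAsFactors    = FALSE
)

# Collapse all "Past Due (...)" variants into a single level
loans$LoanStatus <- factor(sub("^Past Due.*", "Past Due", loans$LoanStatus))

# Parse the date part of the timestamp
loans$ListingCreationDate <- as.Date(substr(loans$ListingCreationDate, 1, 10))

# Term as an ordered factor: 12 < 36 < 60
loans$Term <- factor(loans$Term, levels = c(12, 36, 60), ordered = TRUE)

# Full state name from the abbreviation (built-in state.abb / state.name)
loans$BorrowerStateName <- state.name[match(loans$BorrowerState, state.abb)]

# "True"/"False" strings to logical
loans$IsBorrowerHomeowner <- loans$IsBorrowerHomeowner == "True"

# Missing occupation becomes an explicit category
loans$Occupation[is.na(loans$Occupation)] <- "Not Available"

str(loans)
```

Note that `state.abb` covers only the 50 states, so abbreviations such as DC would need separate handling.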
Correlogram shows that there is a strong correlation (>= 0.7) between:
These correlations are not surprising. In my analysis, I plan to focus on the variables that affect a borrower’s ability to pay off the loan, so Loan Original Amount and Monthly Loan Payment will definitely be investigated.
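Pairs of variables above the 0.7 threshold can also be listed directly from the correlation matrix instead of being read off the correlogram. The sketch below uses simulated data in which the monthly payment is constructed from the loan amount, so the numbers are not those of the Prosper data; only the technique carries over.

```r
set.seed(1)

# Simulated stand-ins for three numeric variables (not real Prosper data)
amount  <- runif(200, 1000, 35000)
payment <- amount / 36 + rnorm(200, sd = 20)  # built to track the amount
income  <- runif(200, 1000, 10000)            # independent of the others

num <- data.frame(LoanOriginalAmount  = amount,
                  MonthlyLoanPayment  = payment,
                  StatedMonthlyIncome = income)

# Pairwise correlations, ignoring missing values pair by pair
cm <- cor(num, use = "pairwise.complete.obs")

# Keep each pair once (upper triangle) if |r| >= 0.7
idx <- which(abs(cm) >= 0.7 & upper.tri(cm), arr.ind = TRUE)
strong <- data.frame(var1 = rownames(cm)[idx[, "row"]],
                     var2 = colnames(cm)[idx[, "col"]],
                     r    = cm[idx])
print(strong)
```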
## # A tibble: 7 x 3
## LoanStatus AvgProsperScore Count
## <fct> <dbl> <int>
## 1 Cancelled 0 5
## 2 Defaulted 1.13 5018
## 3 Charged Off 2.40 11992
## 4 Completed 3.38 38074
## 5 Past Due 5.06 2067
## 6 Final Payment 5.75 205
## 7 Current 5.84 56576
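Looking at the above plot we can see that the average Prosper score is very low for loans that were defaulted or charged off. Interestingly, the Prosper score is also low for completed loans and rather high for loans that are past due. Current loans and those awaiting a final payment have the highest Prosper scores. These averages should be read with care: score 0 appears to mark a missing score (its count, 29,084, equals the number of missing Prosper ratings), which would pull down the averages of statuses that contain many unscored loans.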
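The summary of the data lists 29,084 loans at Prosper Score 0 and the same number of missing Prosper ratings, which suggests that 0 may stand for a missing score. If that is the case, the treatment of missing values changes the group averages considerably. The toy example below, with made-up values, compares both treatments.

```r
# Hypothetical toy data: NA marks loans that never received a Prosper score
toy <- data.frame(
  LoanStatus   = c("Completed", "Completed", "Completed", "Current", "Current"),
  ProsperScore = c(NA, NA, 6, 5, 7)
)

# Variant 1: missing scores recoded to 0 before averaging
score_zero    <- ifelse(is.na(toy$ProsperScore), 0, toy$ProsperScore)
avg_with_zero <- tapply(score_zero, toy$LoanStatus, mean)

# Variant 2: missing scores dropped from the average
avg_without_na <- tapply(toy$ProsperScore, toy$LoanStatus, mean, na.rm = TRUE)

print(avg_with_zero)
print(avg_without_na)
```

In this toy data both groups have the same average score once missing values are dropped, while recoding to 0 makes the Completed group look much weaker.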
## # A tibble: 7 x 3
## LoanStatus AvgLoanOriginalAmount Count
## <fct> <dbl> <int>
## 1 Cancelled 1700 5
## 2 Completed 6189. 38074
## 3 Charged Off 6399. 11992
## 4 Defaulted 6487. 5018
## 5 Past Due 8258. 2067
## 6 Final Payment 8346. 205
## 7 Current 10361. 56576
As we can see in the above plot, the average loan amount for charged-off and defaulted loans is much lower than for current loans. Interestingly, the average loan amount is also lower for completed loans. The average amount for current loans is $10,361.
## # A tibble: 12 x 3
## ProsperScore AvgLoanOriginalAmount Count
## <ord> <dbl> <int>
## 1 0 6159. 29084
## 2 1 4571. 992
## 3 2 5280. 5766
## 4 3 7063. 7642
## 5 4 8402. 12595
## 6 5 8400. 9813
## 7 6 9223. 12278
## 8 7 10097. 10597
## 9 8 10488. 12053
## 10 9 10056. 6911
## 11 10 11743. 4750
## 12 11 14858. 1456
Another interesting observation concerns the relation between the loan amount and the borrower’s Prosper score. In the above plot (N/A values are removed) we can see that higher loan amounts are granted to borrowers with the highest scores. This is in line with common sense: larger loans are granted to more reliable borrowers, and smaller loans to those with a greater risk of non-repayment.
## # A tibble: 7 x 3
## LoanStatus AvgMonthlyPayment Count
## <fct> <dbl> <int>
## 1 Cancelled 61.5 5
## 2 Completed 219. 38074
## 3 Defaulted 233. 5018
## 4 Charged Off 235. 11992
## 5 Past Due 276. 2067
## 6 Final Payment 298. 205
## 7 Current 320. 56576
As we can see in the above plot, the average monthly payment is the highest for current loans, followed closely by loans awaiting final payment. Past due loans are in third position. We can also see that the average monthly loan payment for completed loans is one of the lowest. This is as expected: smaller loans are easier to pay back, while larger loans take more time and effort.
## # A tibble: 7 x 3
## LoanStatus AvgStatedMonthlyIncome Count
## <fct> <dbl> <int>
## 1 Cancelled 2609. 5
## 2 Defaulted 4367. 5018
## 3 Charged Off 4486. 11992
## 4 Completed 5325. 38074
## 5 Past Due 5367. 2067
## 6 Current 6153. 56576
## 7 Final Payment 6312. 205
As we can see in the above plot, the average stated monthly income is the highest for loans awaiting final payment, followed closely by current loans. Past due loans are in third position, close to completed loans. Low income is typical for cancelled, defaulted and charged-off loans.
It’s not surprising to see that the average debt to income ratio is the highest for defaulted, past due and charged-off loans. We can also see it is significantly lower for completed loans and for loans awaiting the final payment.
How did the feature(s) of interest vary with other features in the dataset? Did you observe any interesting relationships between the other features (not the main feature(s) of interest)?
An interesting relationship I have observed is that Prosper score and loan status are not related in the way we might expect. Quite a high number of loans with a high Prosper score are past due. Moreover, the average Prosper score for completed loans is much lower than for past due loans.
Another interesting insight is that the average loan amount for borrowers with score 0 is higher than the average loan amount for borrowers with score 1 or score 2. This would be surprising if score 0 meant the highest risk. A likely explanation is that score 0 marks loans without a Prosper score (its count, 29,084, equals the number of missing Prosper ratings) rather than the riskiest borrowers.
The strongest relationship I found was between debt to income ratio and loan status. Borrowers with a high debt to income ratio more often defaulted, fell past due or had their loans charged off.
Let’s start by looking at the relation between employment status, debt to income ratio and loan status.
As we can see in the above plot, only a small fraction of the loans belong to unemployed borrowers. Most of the current loans are in the Employed and Other buckets. Unfortunately, most of the defaulted loans are in the Not available bucket. The debt to income ratio is below 1 for most loans, although there is a small peak at a ratio of 10. To get a better understanding of these relations, let’s zoom in on loans with a debt to income ratio below the 99th percentile.
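Zooming in below the 99th percentile amounts to computing that percentile and keeping only the smaller values. The sketch below uses simulated ratios with an artificial spike at 10.01, the maximum shown in the summary; the variable names are illustrative.

```r
set.seed(42)

# Simulated debt to income ratios: 990 ordinary values plus a spike at 10.01
dti <- c(runif(990, 0, 1), rep(10.01, 10))

# 99th percentile of the non-missing values
cutoff <- quantile(dti, 0.99, na.rm = TRUE)

# Keep only non-missing values strictly below the cutoff
dti_trimmed <- dti[!is.na(dti) & dti < cutoff]

print(cutoff)
print(summary(dti_trimmed))
```

The same filter works for any of the skewed variables in this analysis, such as monthly loan payment or stated monthly income.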
Looking at this plot, we can see that there are significantly fewer charged-off loans in the Employed bucket. We do see them, however, in the Full-time bucket. We can also see that defaulted loans are spread rather evenly across debt to income ratios. This is rather surprising, as we would expect borrowers with the highest debt to income ratio to default on their loans more frequently.
Now, let’s see how monthly loan payment relates to employment status and loan status.
As we can see in the above plot, most loans have a monthly payment below $1,500. The self-employed bucket is quite interesting, as loans with payments closer to $1,000 tend to be charged off or defaulted. To get a better understanding of these relations, let’s zoom in on loans with a monthly payment below the 99th percentile.
Monthly loan payment does not seem to have a big impact on loan status for employed borrowers. We can see only a slight increase in the number of charged-off and defaulted loans for full-time and not employed borrowers. For self-employed borrowers, the number of charged-off or defaulted loans is higher for monthly payments above $600 than for lower monthly payments.
Another interesting observation is that borrowers who are not employed, retired or working part-time tend to be offered loans with a monthly payment below $400. The majority of loans granted to self-employed borrowers have a monthly payment below $600. At the same time, employed and full-time workers mostly get loans with a monthly payment below $900.
Let’s now look at the relation between Stated Monthly Income, Monthly Loan Payment and Loan Status.
As we can see, there are some outliers in Stated Monthly Income, so let’s zoom in on values below the 99th percentile.
As we can see, for the majority of loans the monthly payment is below $700 and the borrower’s stated income is below $10,000. It is interesting to see many charged-off and defaulted loans in the lower left corner of the plot, where both the monthly payment and the income are small. It seems that borrowers with higher income can pay off the loan even if the monthly payment is larger, while borrowers with low income struggle to pay off the loan even if the monthly payment is low.
In this plot, we can look closer at Prosper rating, loan original amount and loan status. Most of the completed loans were for small amounts, while most of the large loans are still active. We can also clearly see a relation between Prosper rating and loan amount: borrowers with a low rating are granted smaller loans, while those with a good rating are granted bigger loans. High Risk (HR) and E category borrowers are almost never granted a loan above $10,000, while some of the AA, A and B category borrowers have loans above $25,000.
Lastly, let’s check whether there is any significant relation between Prosper Rating, Listing Category and Loan Status.
It seems that there is no relation between the listing category and loan status. There is no category that would have significantly more defaulted or charged-off loans.
Were there features that strengthened each other in terms of looking at your feature(s) of interest? Were there any interesting or surprising interactions between features?
I noticed that only a small fraction of the loans belong to unemployed borrowers, and most of the current loans belong to employed or full-time borrowers. At first glance, employed borrowers have significantly fewer charged-off loans; however, these loans appear frequently for full-time borrowers. Unfortunately, for most of the defaulted loans we do not have information on the borrower’s employment status, which limits our investigation.
Another interesting finding is that the debt to income ratio is below 1 for most of the loans. We could also see that defaulted loans are spread rather evenly across debt to income ratios. This is rather surprising, as we would expect borrowers with the highest debt to income ratio to default on their loans more frequently.
As for how monthly loan payment relates to employment status and loan status, monthly loan payment does not seem to have a big impact on loan status for employed borrowers. We could see only a slight increase in the number of charged-off and defaulted loans for full-time and not employed borrowers. However, for self-employed borrowers, the number of charged-off or defaulted loans is higher for monthly payments above $600 than for lower monthly payments.
Another interesting observation is that borrowers who are not employed, retired or working part-time tend to be offered loans with a monthly payment below $400. The majority of loans granted to self-employed borrowers have a monthly payment below $600. At the same time, employed and full-time workers mostly get loans with a monthly payment below $900.
For the majority of loans, the monthly payment is below $700 and the borrower’s stated income is below $10,000. It was interesting to see many charged-off and defaulted loans for borrowers with a small monthly payment and a small income. It seems that borrowers with higher income can pay off the loan even if the monthly payment is larger, while borrowers with low income struggle to pay off the loan even if the monthly payment is low.
We could also see that most of the completed loans were for small amounts, while most of the large loans are still active. We could also clearly see a relation between Prosper rating and loan amount: borrowers with a low rating are granted smaller loans, while those with a good rating are granted bigger loans. High Risk (HR) and E category borrowers are almost never granted a loan above $10,000, while some of the AA, A and B category borrowers have loans above $25,000.
Surprisingly, we did not find any relation between the listing category and loan status. There is no category that would have significantly more defaulted or charged-off loans.
About half of the loans in the dataset are currently active (56,576). Over 38,000 have been paid off. Around 17,000 were charged off or defaulted. Understanding the reasons behind defaulted and charged-off loans is very important from the loan safety perspective. Both investors and lending companies are highly interested in understanding which factors affect a borrower’s ability to pay off the debt.
In this plot, we can look closer at stated monthly income, monthly loan payment and loan status. It is interesting to see many charged-off and defaulted loans in the lower left corner of the plot, where both the monthly payment and the income are small. It seems that borrowers with higher income can pay off the loan even if the monthly payment is larger, while borrowers with low income struggle to pay off the loan even if the monthly payment is low.
In this plot, we can look closer at Prosper rating, loan original amount and loan status. Most of the completed loans were for small amounts, while most of the large loans are still active. We can also clearly see a relation between Prosper rating and loan amount: borrowers with a low rating are granted smaller loans, while those with a good rating are granted bigger loans. High Risk (HR) and E category borrowers are almost never granted a loan above $10,000, while some of the AA, A and B category borrowers have loans above $25,000.
I have tried to understand whether employment status, loan amount, monthly loan payment and monthly income have a significant impact on the borrower’s ability to pay off the loan. The main challenge I faced with this data set is that many loans lacked some information, for example employment status or listing category. This significantly limited the investigation and the ability to draw conclusions from the data. Gathering more data about borrowers could help us better understand the relations between loan status and borrower profile.